AITopics | Richmond

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Deng, Ailin, Cao, Tri, Chen, Zhirui, Hooi, Bryan

arXiv.org Artificial Intelligence, Mar-3-2025

Vision-Language Models (VLMs) excel in integrating visual and textual information for vision-centric tasks, but their handling of inconsistencies between modalities is underexplored. We investigate VLMs' modality preferences when faced with visual data and varied textual inputs in vision-centered settings. By introducing textual variations to four vision-centric tasks and evaluating ten VLMs, we discover a "blind faith in text" phenomenon: VLMs disproportionately trust textual data over visual data when inconsistencies arise, leading to significant performance drops under corrupted text and raising safety concerns. We analyze factors influencing this text bias, including instruction prompts, language model size, text relevance, token order, and the interplay between visual and textual certainty. While certain factors, such as scaling up the language model size, slightly mitigate text bias, others, such as token order, can exacerbate it due to positional biases inherited from language models. To address this issue, we explore supervised fine-tuning with text augmentation and demonstrate its effectiveness in reducing text bias. Additionally, we provide a theoretical analysis suggesting that the blind faith in text phenomenon may stem from an imbalance of pure text and multi-modal data during training. Our findings highlight the need for balanced training and careful consideration of modality interactions in VLMs to enhance their robustness and reliability in handling multi-modal data inconsistencies.
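As a rough illustration of what "trusting text over vision" could mean operationally, the sketch below tallies how often a model's answer agrees with the text-supported answer versus the image-supported answer on samples where the two conflict. The function name `modality_preference` and the record layout are hypothetical and are not taken from the paper; the paper's own metrics may differ.

```python
from collections import Counter


def modality_preference(records):
    """Summarize which modality a model follows when image and text disagree.

    Each record is a tuple (model_answer, image_answer, text_answer), where
    image_answer is the answer supported by the image and text_answer is the
    answer supported by the accompanying text. This layout is an assumption
    made for illustration only.
    """
    counts = Counter()
    for answer, image_answer, text_answer in records:
        if image_answer == text_answer:
            continue  # modalities agree, so no preference can be observed
        if answer == text_answer:
            counts["text"] += 1
        elif answer == image_answer:
            counts["image"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("text", "image", "other")}


if __name__ == "__main__":
    demo = [
        ("cat", "dog", "cat"),   # follows the text
        ("dog", "dog", "cat"),   # follows the image
        ("cat", "dog", "cat"),   # follows the text
        ("bird", "dog", "cat"),  # follows neither
        ("dog", "dog", "dog"),   # consistent sample, skipped
    ]
    print(modality_preference(demo))
```

A "text" share well above the "image" share on conflicting samples would be one simple signal of the bias the abstract describes.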

arXiv:2503.02199

Country:

Asia > Singapore (0.04)
North America > United States > Kentucky > Madison County > Richmond (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Government (0.93)
Education (0.93)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Feature-level Malware Obfuscation in Deep Learning

Dillon, Keith

arXiv.org Machine Learning, Feb-9-2020

We consider the problem of detecting malware with deep learning models, where the malware may be combined with significant amounts of benign code. Examples of this include piggybacking and trojan horse attacks on a system, where malicious behavior is hidden within a useful application. Such added flexibility in augmenting the malware enables significantly more code obfuscation. Hence, we focus on the use of static features, particularly Intents, Permissions, and API calls, which we presume cannot be ultimately hidden from the Android system, but only augmented with yet more such features. We first train a deep neural network classifier for malware classification using features of benign and malware samples. Then we demonstrate a steep increase in false-negative rate (i.e., attacks succeed) simply by randomly adding features of a benign app to malware. Finally, we test the use of data augmentation to harden the classifier against such attacks. We find that for API calls, it is possible to reject the vast majority of attacks, whereas using Intents or Permissions is less successful.
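The attack and the hardening step described above can be sketched roughly as follows, assuming each app is represented as a set of static feature names. The helper names `obfuscate` and `augment_training_set`, the parameter `k`, and the label convention (1 = malware) are hypothetical choices for illustration, not the author's implementation.

```python
import random


def obfuscate(malware_features, benign_features, k, rng):
    """Return the malware feature set plus up to k features borrowed from a benign app.

    Features are only added, never removed, matching the assumption that
    static features can be augmented but not hidden.
    """
    candidates = sorted(benign_features - malware_features)
    added = rng.sample(candidates, min(k, len(candidates)))
    return malware_features | set(added)


def augment_training_set(malware_samples, benign_samples, k, copies, rng):
    """Build extra training examples: obfuscated malware variants, still labeled 1."""
    augmented = []
    for features in malware_samples:
        for _ in range(copies):
            donor = rng.choice(benign_samples)
            augmented.append((obfuscate(features, donor, k, rng), 1))
    return augmented


if __name__ == "__main__":
    rng = random.Random(0)
    malware = {"SEND_SMS", "READ_CONTACTS"}
    benign = {"INTERNET", "CAMERA", "READ_CONTACTS"}
    print(obfuscate(malware, benign, k=5, rng=rng))
    print(len(augment_training_set([malware], [benign], k=1, copies=3, rng=rng)))
```

Training on the augmented examples alongside the original data is one plausible way to realize the data augmentation the abstract mentions; the actual feature encoding and network are not specified here.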

arXiv:2002.05517

Country:

North America > United States > Kentucky > Madison County > Richmond (0.04)
North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
